1. INTRODUCTION

Feature selection is one of the essential methods in machine learning. Prediction is impossible when a dataset
lacks adequate features. Conversely, using all features may also be infeasible because the amount of available
training data is small relative to the dimensionality [1]. Even though feature selection tends to cause biases when
handling missing data [2], it can handle irrelevant (uncorrelated) or redundant features, which improves prediction
performance [3]. There are two types of feature selection: the filter technique and the wrapper technique [4, 5].
The filter technique evaluates features based on the characteristics of the data, without using any classification
algorithm [6], and is used for high-dimensional data [7]. In contrast, the wrapper technique uses a specific classifier
to evaluate the quality of the selected features and the effect of the feature subset on the algorithm's performance [4, 8].

According to [9], the most standard filters rank features by their predictive power, which can be measured in
several ways, such as the Fisher score [10], the Chi-Square test [11], the Laplacian score [12], Pearson correlation [13],
or mutual information [14]. Wrapper feature selection, on the other hand, is one of the most common and practical
techniques [15]. Examples of wrapper methods include the ant colony algorithm with an artificial neural network [16],
a genetic algorithm with k-nearest neighbors [17], and binary PSO and a mutation algorithm with a decision tree [18].
Feature selection reduces the dimension by eliminating irrelevant or redundant features, which helps improve
the learning accuracy of computational intelligence methods [19]. Furthermore, it is significant because, with the same
training data, a learning algorithm can perform better with some feature subsets than with others [20].

Many researchers have developed new feature selection methods. The large margin hybrid algorithm for feature
selection (LMFS) proposed by Zhang et al. [21] successfully overcomes over-fitting between the optimal feature
subset and a given classifier. Yuan et al. [22] proposed partial maximum correlation information (PMCI), a new
feature selection method that delivers relatively good performance with lower time complexity than other methods.
The LW-index with the sequential forward search algorithm (SFS-LW), proposed by Liu et al. [23], obtained accuracy
similar to that of the wrapper method.

Meanwhile, Chiew et al. [24] proposed hybrid ensemble feature selection (HEFS) for machine learning-based
phishing detection systems, an approach that is highly desirable and practical. Curious feature selection (CFS),
which is motivated by artificial curiosity, has also been shown to positively impact the accuracy of the learning
model [25]. Nevertheless, improving existing feature selection methods and developing new ones remains an
appealing research problem. Kernel functions are commonly used in machine learning to separate data linearly
when the data are not linearly separable in the original space, by implicitly mapping them into a higher-dimensional
feature space. Therefore, this paper introduces a new feature selection algorithm based on a kernel function.
K-means clustering [26] was used to examine its performance in terms of accuracy and F1-Score.


2. PROPOSED METHOD 

This research introduces a new kernel-based feature selection algorithm with three steps: calculating the mean
of the features, applying the kernel function, and sorting the features by importance.
Let $X = \{C_1, C_2, \ldots, C_k\}$ be a set of $k$ classes that together contain the $n$ samples of a dataset with
$f$ features, where each sample $x = (x_1, x_2, \ldots, x_f) \in C_y$ and $|C_y| = n_y$. From these values, the mean of
each feature in every class is computed; the mean serves as a representative value of the feature within the class.
The means of the $f$ features in the $y$-th class form the vector $m_y = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_f)$.
These $k$ vectors are then used to construct the $k \times f$ matrix $M = [m_1\; m_2\; \cdots\; m_k]^T$.

After that, the kernel transformation is performed on every pair of mean vectors $m_i, m_j$ with $i \neq j$,
projecting them into a high-dimensional feature space through the function

$$k(m_i, m_j): \mathcal{X} \times \mathcal{X} \rightarrow \mathcal{F} \tag{1}$$


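This research uses two kernel functions, the Gaussian radial basis function (RBF) kernel and the polynomial
kernel, each with several values of its kernel parameter. Their formulas are given in (2) and (3):

RBF kernel function:
$$k(m_i, m_j) = \exp\left(-\frac{\|m_i - m_j\|^2}{2\sigma^2}\right) \tag{2}$$

Polynomial kernel function:
$$k(m_i, m_j) = \left(m_i^T m_j + 1\right)^d \tag{3}$$

where $\sigma$ is the kernel parameter of the RBF kernel and $d$ is the polynomial degree.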


The result of this transformation is then stored in the kernel matrix given in (4):

$$K = [k(m_i, m_j)] = [k_{ij}] \tag{4}$$


The feature importance is derived from this kernel matrix. Finally, the entries of every row, that is, the kernel
representations of the means, are summed using (5):

$$S_i = \sum_{j=1,\, j \neq i}^{f} k_{ij}, \quad 1 \le i \le f \tag{5}$$

These values are then sorted in decreasing order, and the resulting order of the features represents the feature
importance in the dataset. This order is then used to perform the feature selection.
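
The three steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. First, so that (5) yields one score per feature, the kernel is evaluated between the columns of M: each feature is represented by its vector of k class means. Second, the RBF kernel uses the 2σ² form. Third, the function names `class_means`, `rbf_kernel`, `poly_kernel` and `kernel_feature_ranking` are hypothetical.

```python
import numpy as np


def class_means(X, y):
    """Return the k x f matrix M with one row of per-feature means for each class."""
    return np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])


def rbf_kernel(A, sigma):
    """Gaussian RBF kernel between rows of A (assumed form exp(-||a_i - a_j||^2 / (2 sigma^2)))."""
    sq_norms = np.sum(A ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def poly_kernel(A, degree):
    """Polynomial kernel between rows of A: (a_i^T a_j + 1)^degree."""
    return (A @ A.T + 1.0) ** degree


def kernel_feature_ranking(X, y, kernel="rbf", param=1.0):
    """Score each feature with S_i = sum_{j != i} k_ij; return features by decreasing S_i."""
    M = class_means(X, y)   # k x f
    F = M.T                 # assumption: row i = the k class means of feature i
    if kernel == "rbf":
        K = rbf_kernel(F, param)
    elif kernel == "poly":
        K = poly_kernel(F, int(param))
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    np.fill_diagonal(K, 0.0)                  # drop the i == j terms, as in (5)
    S = K.sum(axis=1)
    order = np.argsort(-S, kind="stable")     # ties keep the original feature order
    return order, S


# Tiny usage example: 2 classes, 3 features.
X = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [3.0, 1.0, 0.0]])
y = np.array([0, 0, 1, 1])
order, S = kernel_feature_ranking(X, y, kernel="poly", param=1)
print(order, S)  # order = [0, 1, 2], S = [7, 5, 4]
```

Excluding the diagonal matters mainly for the polynomial kernel. With the RBF kernel every diagonal entry equals 1, so including it would only shift all scores equally.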


3. RESEARCH METHOD 
3.1. Dataset 

In these experiments, 16 real-world datasets from the UCI data repository [27] are used to examine
the performance of the proposed method; their details are summarized in Table 1.
Table 1. Characteristics of the real-world datasets

Dataset | Number of samples | Number of features
Iris | 150 | 4
Thyroid disease | 215 | 5
Credit score | 100 | 6
Breast cancer Wisconsin (BCW) (Diagnostic) | 569 | 30
Glass identification | 214 | 9
Letter recognition | 20000 | 16
Statlog (Landsat satellite) | 6435 | 36
Wine | 178 | 13
Statlog (Vehicle silhouettes) | 946 | 18
Housing | 506 | 13
Machine | 209 | 6
Mammographic mass | 961 | 5
Seismic-bumps | 2584 | 18
Cardiotocography | 2126 | 21
Forest type mapping | 326 | 27
Image segmentation | 2310 | 19

3.2. Algorithm 

The new kernel-based feature selection consists of three steps: calculating the mean of the features, applying
the kernel function, and sorting the features by importance. The algorithm is given in Figure 1. In the evaluation,
only the first 75 percent of the features, after sorting, are used. K-means clustering with 10-fold cross-validation
is then used to evaluate the model on the features retained by the new feature selection algorithm. The k-means
clustering algorithm is shown in Figure 2.
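
The evaluation protocol above keeps the first 75 percent of the ranked features and then applies 10-fold cross-validation. It can be sketched as follows. Several details are assumptions: how a non-integer 75 percent is rounded (rounded up here), whether the folds are shuffled (shuffled here), and the hypothetical helper names `select_top_fraction` and `kfold_indices`.

```python
import math

import numpy as np


def select_top_fraction(order, fraction=0.75):
    """Keep the first `fraction` of the ranked feature indices.

    Rounding up with math.ceil is an assumption; the text does not say how
    a non-integer number of features is rounded.
    """
    n_keep = max(1, math.ceil(fraction * len(order)))
    return list(order[:n_keep])


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]
```

Under this rounding, Iris (4 features) keeps 3 features and Thyroid disease (5 features) keeps 4.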

Input: $X = \{C_1, C_2, \ldots, C_k\}$, where $x = (x_1, x_2, \ldots, x_f) \in C_y$ and $|C_y| = n_y$
Output: sorted features
1. Calculate the mean of each class: $m_y = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_f)$.
2. Construct the matrix $M = [m_1\; m_2\; \cdots\; m_k]^T$.
3. Compute the kernel matrix $K = [k(m_i, m_j)] = [k_{ij}]$, where $i \neq j$ and $k(m_i, m_j)$ is calculated with the kernel type used.
4. Find $S_i = \sum_{j \neq i} k_{ij}$ for $i = 1, 2, \ldots, f$, and sort these values in decreasing order. The indices of the sorted $S_i$ give the order in which features are used.
End

Figure 1. Our new kernel-based feature selection algorithm

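Input: $X = \{x_1, x_2, \ldots, x_n\}$, the number of clusters $c$, the convergence threshold $\varepsilon$, and $T$ (the maximum number of iterations allowed).
Output: $V = \{v_1, v_2, \ldots, v_c\}$, $R = [r_{ik}]$, $1 \le i \le n$, $1 \le k \le c$.
1. Initialization: $V^{(0)} = \{v_1, v_2, \ldots, v_c\}$, $t = 1$.
2. Compute $\|x_i - v_k^{(t-1)}\|$ for every data point $x_i$ and every cluster center $v_k^{(t-1)}$.
3. Update the membership of data point $x_i$ in the $k$-th cluster according to: $r_{ik} = 1$ if $k = \arg\min_j \|x_i - v_j^{(t-1)}\|$, and $r_{ik} = 0$ otherwise.
4. Update the cluster centers $V^{(t)}$ using: $v_k^{(t)} = \sum_{i=1}^{n} r_{ik} x_i \big/ \sum_{i=1}^{n} r_{ik}$.
5. If $\|V^{(t)} - V^{(t-1)}\| < \varepsilon$ or $t = T$, the iteration stops. Otherwise, set $t = t + 1$ and go back to step 2.
End

Figure 2. K-means clustering algorithm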
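
The steps of Figure 2 can be sketched with NumPy as the minimal hard k-means below. The paper does not specify the random initialization of V^(0), the default values of ε and T, or how an empty cluster is handled, so these are assumptions, and the function name `kmeans` is hypothetical. K-means returns cluster indices, not class labels. How clusters are mapped to classes for the confusion matrix is not stated in the text.

```python
import numpy as np


def kmeans(X, c, eps=1e-6, T=100, seed=0):
    """Hard k-means following the steps of Figure 2.

    Returns the cluster centers V (c x f) and the 0/1 membership matrix R (n x c).
    Initializing V^(0) with c distinct random samples is an assumption. T must be >= 1.
    """
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)].astype(float)  # step 1
    for _ in range(T):  # at most T iterations (step 5: stop when t = T)
        # Step 2: distance from every sample to every center.
        dists = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        # Step 3: r_ik = 1 for the nearest center, 0 otherwise.
        R = np.zeros((len(X), c))
        R[np.arange(len(X)), dists.argmin(axis=1)] = 1.0
        # Step 4: each center becomes the mean of its members (unchanged if the cluster is empty).
        counts = R.sum(axis=0)
        nonempty = counts > 0
        V_new = V.copy()
        V_new[nonempty] = (R.T @ X)[nonempty] / counts[nonempty][:, None]
        # Step 5: stop when the centers move less than eps.
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < eps:
            break
    return V, R
```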


3.3. Performance metrics 

In evaluating the performance of our new kernel-based feature selection, we use the confusion matrix of
the k-means clustering results. The confusion matrix consists of four possible outcomes: true positives (TP),
false negatives (FN), true negatives (TN), and false positives (FP) [28]. A positive instance that is correctly
predicted is counted as a true positive; otherwise, it is a false negative. Likewise, a negative instance that is
correctly predicted is counted as a true negative; otherwise, it is a false positive [29].

In this paper, the confusion matrix is used to compute the performance metrics accuracy and F1-Score,
whose formulas are given in (6) and (7):
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{6}$$

$$\text{F1-Score} = \frac{2 \times \text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}} \tag{7}$$

where sensitivity and precision are defined in (8) and (9):

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$
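
As a worked illustration of (6) to (9), the sketch below counts TP, FN, TN and FP for one positive class and then computes the four metrics. It is a minimal binary sketch. The helper names `confusion_counts` and `metrics` are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention. The datasets here are multi-class, and the text does not state how per-class scores are aggregated.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # positive instance predicted correctly
            else:
                fn += 1  # positive instance missed
        elif p == positive:
            fp += 1      # negative instance predicted as positive
        else:
            tn += 1      # negative instance predicted correctly
    return tp, fn, tn, fp


def metrics(tp, fn, tn, fp):
    """Accuracy (6), sensitivity (8), precision (9) and F1-Score (7)."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return accuracy, sensitivity, precision, f1


y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(metrics(3, 1, 3, 1))  # all four metrics equal 0.75 here
```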


4. RESULT AND DISCUSSION 
4.1. The performance of our new feature selection based on RBF kernel function

In this section, the performance of k-means clustering is examined using the new feature selection based on
the RBF kernel function with several values of the kernel parameter σ; the accuracy results are shown in Table 2.
The method performs well on almost all of the real-world datasets, and for most of them the best accuracy is
obtained with σ = 1000. The Machine dataset, in contrast, has its highest accuracy with σ = 0.0001. For several
datasets (Iris, Breast cancer Wisconsin (Diagnostic), Wine, Mammographic mass, and Seismic-bumps), the accuracy
is constant across all values of the kernel parameter. The F1-Score results are shown in Table 3.


Table 2. The accuracy (%) of our method on the real-world datasets using the RBF kernel function, for each value of the kernel parameter σ

Dataset | σ = 0.0001 | σ = 0.001 | σ = 0.05 | σ = 0.1 | σ = 1 | σ = 5 | σ = 10 | σ = 50 | σ = 100 | σ = 1000
Iris | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00
Thyroid disease | 98.14 | 98.32 | 98.38 | 98.42 | 98.43 | 98.45 | 98.45 | 98.46 | 98.47 | 98.47
Credit score | 94.44 | 96.11 | 96.67 | 96.94 | 97.11 | 97.22 | 97.30 | 97.36 | 97.41 | 97.44
Breast cancer Wisconsin (Diagnostic) | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00
Glass identification | 94.29 | 94.76 | 94.92 | 95.00 | 95.05 | 95.08 | 95.10 | 95.12 | 95.13 | 95.14
Letter recognition | 97.19 | 98.32 | 98.76 | 98.99 | 99.13 | 99.22 | 99.28 | 99.33 | 99.37 | 99.40
Statlog (Landsat satellite) | 91.62 | 92.64 | 93.04 | 93.25 | 93.37 | 93.46 | 93.52 | 93.56 | 93.60 | 93.63
Wine | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12
Statlog (Vehicle silhouettes) | 87.22 | 87.33 | 87.37 | 87.39 | 87.40 | 87.41 | 87.41 | 87.42 | 87.42 | 87.42
Housing | 85.42 | 85.52 | 85.55 | 85.57 | 85.58 | 85.59 | 85.59 | 85.59 | 85.60 | 85.60
Machine | 85.46 | 85.30 | 85.25 | 85.22 | 85.21 | 85.19 | 85.19 | 85.18 | 85.18 | 85.17
Mammographic mass | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33
Seismic-bumps | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85
Cardiotocography | 88.80 | 89.74 | 90.05 | 90.21 | 90.30 | 90.37 | 90.41 | 90.44 | 90.47 | 90.49
Forest type mapping | 90.07 | 92.80 | 93.79 | 94.30 | 94.60 | 94.80 | 94.95 | 95.06 | 95.14 | 95.21
Image segmentation | 96.42 | 97.29 | 97.64 | 97.81 | 97.92 | 97.99 | 98.04 | 98.08 | 98.11 | 98.14

Table 3. The F1-Score (%) of our method on the real-world datasets using the RBF kernel function, for each value of the kernel parameter σ

Dataset | σ = 0.0001 | σ = 0.001 | σ = 0.05 | σ = 0.1 | σ = 1 | σ = 5 | σ = 10 | σ = 50 | σ = 100 | σ = 1000
Iris | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04
Thyroid disease | 98.89 | 99.00 | 99.04 | 99.05 | 99.06 | 99.07 | 99.08 | 99.08 | 99.08 | 99.09
Credit score | 88.37 | 91.57 | 92.68 | 93.25 | 93.60 | 93.83 | 93.99 | 94.12 | 94.21 | 94.29
Breast cancer Wisconsin (Diagnostic) | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71
Glass identification | 96.25 | 96.55 | 96.65 | 96.70 | 96.73 | 96.75 | 96.77 | 96.78 | 96.79 | 96.79
Letter recognition | 97.94 | 98.57 | 98.89 | 99.07 | 99.18 | 99.26 | 99.32 | 99.36 | 99.39 | 99.42
Statlog (Landsat satellite) | 92.26 | 92.93 | 93.21 | 93.36 | 93.45 | 93.52 | 93.56 | 93.60 | 93.62 | 93.65
Wine | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66
Statlog (Vehicle silhouettes) | 84.96 | 85.14 | 85.20 | 85.23 | 85.25 | 85.26 | 85.27 | 85.28 | 85.28 | 85.29
Housing | 84.68 | 84.92 | 85.01 | 85.05 | 85.07 | 85.09 | 85.10 | 85.11 | 85.12 | 85.12
Machine | 83.78 | 83.60 | 83.54 | 83.50 | 83.49 | 83.47 | 83.46 | 83.46 | 83.45 | 83.45
Mammographic mass | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74
Seismic-bumps | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05
Cardiotocography | 90.74 | 91.28 | 91.47 | 91.57 | 91.62 | 91.66 | 91.69 | 91.71 | 91.73 | 91.74
Forest type mapping | 92.08 | 93.53 | 94.14 | 94.46 | 94.67 | 94.80 | 94.90 | 94.98 | 95.04 | 95.09
Image segmentation | 96.78 | 97.45 | 97.72 | 97.87 | 97.96 | 98.02 | 98.06 | 98.09 | 98.12 | 98.14

As a measure that weighs sensitivity and precision equally, the F1-Score of our method was also high, and for
most datasets the best F1-Score was again obtained with σ = 1000.
In addition to these performance metrics, the running time was evaluated; the results, in seconds, are summarized
in Table 4. The running time varies with the value of the kernel parameter. Except for the Letter recognition
dataset, which takes roughly 280 to 344 seconds, the algorithm runs quickly, requiring at most about 22.4 seconds
on the other datasets.


Table 4. The running time (s) of our method on the real-world datasets using the RBF kernel function, for each value of the kernel parameter σ

Dataset | σ = 0.0001 | σ = 0.001 | σ = 0.05 | σ = 0.1 | σ = 1 | σ = 5 | σ = 10 | σ = 50 | σ = 100 | σ = 1000
Iris | 0.13 | 0.14 | 0.11 | 0.11 | 0.13 | 0.11 | 0.16 | 0.13 | 0.11 | 0.11
Thyroid disease | 0.25 | 0.23 | 0.22 | 0.22 | 0.25 | 0.23 | 0.33 | 0.22 | 0.22 | 0.22
Credit score | 0.05 | 0.06 | 0.05 | 0.05 | 0.03 | 0.05 | 0.06 | 0.06 | 0.05 | 0.06
Breast cancer Wisconsin (Diagnostic) | 1.30 | 1.31 | 1.30 | 1.33 | 1.34 | 1.31 | 1.33 | 1.36 | 1.50 | 1.73
Glass identification | 0.23 | 0.25 | 0.27 | 0.22 | 0.22 | 0.22 | 0.27 | 0.23 | 0.22 | 0.27
Letter recognition | 297.31 | 317.42 | 344.06 | 310.23 | 296.70 | 317.45 | 279.84 | 280.25 | 280.98 | 280.41
Statlog (Landsat satellite) | 11.05 | 10.95 | 11.02 | 11.00 | 11.05 | 11.13 | 11.39 | 11.34 | 11.03 | 10.94
Wine | 0.13 | 0.13 | 0.14 | 0.13 | 0.17 | 0.13 | 0.14 | 0.13 | 0.13 | 0.13
Statlog (Vehicle silhouettes) | 3.22 | 3.22 | 3.19 | 3.23 | 3.22 | 3.22 | 3.36 | 3.44 | 3.25 | 3.16
Housing | 1.13 | 1.09 | 1.20 | 1.11 | 1.09 | 1.08 | 1.09 | 1.11 | 1.20 | 1.13
Machine | 0.19 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.20 | 0.17 | 0.19 | 0.17
Mammographic mass | 1.83 | 1.84 | 1.81 | 1.81 | 1.84 | 1.86 | 1.81 | 1.83 | 1.81 | 1.88
Seismic-bumps | 1.16 | 1.13 | 1.13 | 1.14 | 1.14 | 1.13 | 1.13 | 1.13 | 1.16 | 1.11
Cardiotocography | 2.92 | 2.92 | 2.91 | 2.98 | 2.84 | 2.91 | 2.95 | 2.91 | 2.89 | 2.84
Forest type mapping | 1.27 | 1.28 | 1.30 | 1.25 | 1.23 | 1.31 | 1.30 | 1.27 | 1.28 | 1.28
Image segmentation | 22.13 | 22.14 | 22.13 | 22.42 | 22.31 | 22.14 | 22.17 | 22.09 | 22.19 | 22.39

4.2. The performance of our new feature selection based on polynomial kernel function 

After evaluating the new feature selection with the RBF kernel function, this section evaluates the accuracy,
F1-Score, and running time of the new feature selection based on the polynomial kernel function. The accuracy
results are shown in Table 5. In contrast to the RBF kernel function, the accuracy of the new feature selection
based on the polynomial kernel is not affected by the polynomial degree: for every dataset, the accuracy is
the same for all degrees from 1 to 10.


Table 5. The accuracy (%) of our method on the real-world datasets using the polynomial kernel function, for polynomial degrees 1 to 10

Dataset | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5 | Degree 6 | Degree 7 | Degree 8 | Degree 9 | Degree 10
Iris | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00
Thyroid disease | 98.51 | 98.51 | 98.51 | 98.51 | 98.51 | 98.51 | 98.51 | 98.51 | 98.51 | 98.51
Credit score | 97.78 | 97.78 | 97.78 | 97.78 | 97.78 | 97.78 | 97.78 | 97.78 | 97.78 | 97.78
Breast cancer Wisconsin (Diagnostic) | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00 | 90.00
Glass identification | 95.24 | 95.24 | 95.24 | 95.24 | 95.24 | 95.24 | 95.24 | 95.24 | 95.24 | 95.24
Letter recognition | 99.69 | 99.69 | 99.69 | 99.69 | 99.69 | 99.69 | 99.69 | 99.69 | 99.69 | 99.69
Statlog (Landsat satellite) | 93.89 | 93.89 | 93.89 | 93.89 | 93.89 | 93.89 | 93.89 | 93.89 | 93.89 | 93.89
Wine | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12 | 91.12
Statlog (Vehicle silhouettes) | 87.45 | 87.45 | 87.45 | 87.45 | 87.45 | 87.45 | 87.45 | 87.45 | 87.45 | 87.45
Housing | 85.62 | 85.62 | 85.62 | 85.62 | 85.62 | 85.62 | 85.62 | 85.62 | 85.62 | 85.62
Machine | 85.14 | 85.14 | 85.14 | 85.14 | 85.14 | 85.14 | 85.14 | 85.14 | 85.14 | 85.14
Mammographic mass | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33 | 75.33
Seismic-bumps | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85 | 80.85
Cardiotocography | 90.68 | 90.68 | 90.68 | 90.68 | 90.68 | 90.68 | 90.68 | 90.68 | 90.68 | 90.68
Forest type mapping | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83
Image segmentation | 98.35 | 98.35 | 98.35 | 98.35 | 98.35 | 98.35 | 98.35 | 98.35 | 98.35 | 98.35

Likewise, the F1-Score, which considers both sensitivity and precision, is identical for every polynomial
degree, as shown in Table 6. Table 7 shows the running time of the method. The Letter recognition dataset
still requires a long running time (roughly 328 to 355 seconds), whereas the method runs quickly on the other
datasets. The running time also varies with the polynomial degree used.

Table 6. The F1-Score (%) of our method on the real-world datasets using the polynomial kernel function, for polynomial degrees 1 to 10

Dataset | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5 | Degree 6 | Degree 7 | Degree 8 | Degree 9 | Degree 10
Iris | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04 | 98.04
Thyroid disease | 99.11 | 99.11 | 99.11 | 99.11 | 99.11 | 99.11 | 99.11 | 99.11 | 99.11 | 99.11
Credit score | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00 | 95.00
Breast cancer Wisconsin (Diagnostic) | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71 | 87.71
Glass identification | 96.86 | 96.86 | 96.86 | 96.86 | 96.86 | 96.86 | 96.86 | 96.86 | 96.86 | 96.86
Letter recognition | 99.68 | 99.68 | 99.68 | 99.68 | 99.68 | 99.68 | 99.68 | 99.68 | 99.68 | 99.68
Statlog (Landsat satellite) | 93.85 | 93.85 | 93.85 | 93.85 | 93.85 | 93.85 | 93.85 | 93.85 | 93.85 | 93.85
Wine | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66 | 91.66
Statlog (Vehicle silhouettes) | 85.32 | 85.32 | 85.32 | 85.32 | 85.32 | 85.32 | 85.32 | 85.32 | 85.32 | 85.32
Housing | 85.18 | 85.18 | 85.18 | 85.18 | 85.18 | 85.18 | 85.18 | 85.18 | 85.18 | 85.18
Machine | 83.41 | 83.41 | 83.41 | 83.41 | 83.41 | 83.41 | 83.41 | 83.41 | 83.41 | 83.41
Mammographic mass | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74 | 75.74
Seismic-bumps | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05 | 73.05
Cardiotocography | 91.86 | 91.86 | 91.86 | 91.86 | 91.86 | 91.86 | 91.86 | 91.86 | 91.86 | 91.86
Forest type mapping | 95.54 | 95.54 | 95.54 | 95.54 | 95.54 | 95.54 | 95.54 | 95.54 | 95.54 | 95.54
Image segmentation | 98.33 | 98.33 | 98.33 | 98.33 | 98.33 | 98.33 | 98.33 | 98.33 | 98.33 | 98.33

Table 7. The running time (s) of our method on the real-world datasets using the polynomial kernel function, for polynomial degrees 1 to 10

Dataset | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5 | Degree 6 | Degree 7 | Degree 8 | Degree 9 | Degree 10
Iris | 0.14 | 0.16 | 0.11 | 0.13 | 0.13 | 0.11 | 0.16 | 0.14 | 0.11 | 0.16
Thyroid disease | 0.23 | 0.23 | 0.22 | 0.22 | 0.30 | 0.27 | 0.23 | 0.25 | 0.25 | 0.22
Credit score | 0.06 | 0.06 | 0.08 | 0.05 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 | 0.06
Breast cancer Wisconsin (Diagnostic) | 1.33 | 1.56 | 1.34 | 1.50 | 1.34 | 1.50 | 1.34 | 1.56 | 1.34 | 1.42
Glass identification | 0.27 | 0.25 | 0.25 | 0.25 | 0.25 | 0.23 | 0.25 | 0.23 | 0.23 | 0.25
Letter recognition | 331.53 | 340.69 | 330.16 | 328.73 | 354.52 | 327.86 | 332.80 | 328.75 | 329.16 | 341.53
Statlog (Landsat satellite) | 12.48 | 12.67 | 12.66 | 13.25 | 12.73 | 12.52 | 14.05 | 13.36 | 13.19 | 13.91
Wine | 0.16 | 0.13 | 0.13 | 0.16 | 0.13 | 0.19 | 0.17 | 0.16 | 0.13 | 0.16
Statlog (Vehicle silhouettes) | 3.48 | 3.56 | 3.52 | 3.39 | 3.42 | 3.44 | 3.42 | 3.56 | 3.66 | 3.69
Housing | 1.20 | 1.22 | 1.23 | 1.17 | 1.19 | 1.25 | 1.17 | 1.17 | 1.17 | 1.17
Machine | 0.19 | 0.20 | 0.19 | 0.20 | 0.20 | 0.19 | 0.19 | 0.19 | 0.28 | 0.28
Mammographic mass | 1.88 | 1.91 | 2.16 | 2.08 | 2.19 | 1.92 | 1.92 | 1.92 | 1.98 | 1.97
Seismic-bumps | 1.19 | 1.91 | 1.19 | 1.17 | 1.22 | 1.22 | 1.22 | 1.20 | 1.20 | 1.25
Cardiotocography | 3.17 | 3.81 | 3.11 | 3.06 | 3.09 | 3.20 | 3.27 | 3.06 | 3.06 | 3.08
Forest type mapping | 1.38 | 1.34 | 1.39 | 1.50 | 1.38 | 1.38 | 1.56 | 1.50 | 1.38 | 1.36
Image segmentation | 23.16 | 25.02 | 24.20 | 23.14 | 23.41 | 22.98 | 23.66 | 22.75 | 23.84 | 24.44

4.3. The comparison performance of our new feature selection based on RBF and polynomial kernel 
function with several other feature selection methods 

In this section, the accuracy and F1-Score obtained with the RBF and polynomial kernel functions are compared.
For each dataset, the setting that delivers the best performance is extracted. Because the polynomial kernel
function performs identically for every polynomial degree, we choose the degree with the shorter running time.
The new feature selection based on the RBF and polynomial kernel functions is compared on every dataset.
The performance of the proposed feature selection algorithm was also compared with other well-established
feature selection methods, namely the Fisher score [10], the Chi-Square test [11], and the Laplacian score [12],
as shown in Table 8.
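
The selection rule described above can be written as a small helper. This is a sketch of one plausible reading, not the authors' procedure. The hypothetical `pick_setting` prefers the highest accuracy, then the highest F1-Score, then the shortest running time, and finally the first parameter listed. The text only states the running-time tie-break for the polynomial kernel, so the full ordering is an assumption. With the Iris values from Tables 5 to 7 it returns degree 3, as in Table 8. It does not reproduce every choice in Table 8: for Seismic-bumps, degree 4 runs faster than the reported degree 3.

```python
def pick_setting(params, accuracy, f1, runtime):
    """Return the kernel parameter with the best (accuracy, F1-Score), then the shortest running time.

    The lexicographic tie-breaking order is an assumption; remaining ties go to
    the parameter listed first.
    """
    best_index = min(
        range(len(params)),
        key=lambda i: (-accuracy[i], -f1[i], runtime[i], i),
    )
    return params[best_index]


# Iris, polynomial kernel (values from Tables 5 to 7).
degrees = list(range(1, 11))
acc = [98.00] * 10
f1 = [98.04] * 10
time_s = [0.14, 0.16, 0.11, 0.13, 0.13, 0.11, 0.16, 0.14, 0.11, 0.16]
print(pick_setting(degrees, acc, f1, time_s))  # 3
```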

From Table 8, both kernel functions perform similarly on almost every dataset evaluated. The polynomial kernel
function has a longer running time on most datasets, but it achieves equal or slightly higher accuracy and F1-Score
than the RBF kernel on every dataset except Machine. Compared with the Fisher score, Chi-Square test, and
Laplacian score, our proposed method achieves higher accuracy and F1-Score than all three on the Letter recognition,
Statlog (Landsat satellite), and Mammographic mass datasets. The largest gap occurs on Statlog (Landsat satellite),
where the differences relative to the Fisher score reach about 60 percentage points in accuracy and about 44 percentage
points in F1-Score. On other datasets, such as Iris, Thyroid disease, Wine, and Housing, the three baseline methods
achieve higher scores.
Table 8. Comparison of the proposed method with the Fisher score, Chi-Square test, and Laplacian score algorithms

Dataset | Feature selection method | Accuracy (%) | F1-Score (%) | Running time (s)
Iris | Proposed method, RBF kernel (σ = 0.05) | 98.00 | 98.04 | 0.11
 | Proposed method, polynomial kernel (degree 3) | 98.00 | 98.04 | 0.11
 | Fisher Score | 100.00 | 100.00 | 0.17
 | Chi-Square Test | 100.00 | 100.00 | 0.22
 | Laplacian Score | 100.00 | 100.00 | 7.03
Thyroid disease | Proposed method, RBF kernel (σ = 1000) | 98.47 | 99.09 | 0.22
 | Proposed method, polynomial kernel (degree 3) | 98.51 | 99.11 | 0.22
 | Fisher Score | 100.00 | 100.00 | 0.28
 | Chi-Square Test | 100.00 | 100.00 | 0.36
 | Laplacian Score | 100.00 | 100.00 | 3.42
Credit score | Proposed method, RBF kernel (σ = 1000) | 97.44 | 94.29 | 0.06
 | Proposed method, polynomial kernel (degree 4) | 97.78 | 95.00 | 0.05
 | Fisher Score | 98.81 | 98.60 | 0.03
 | Chi-Square Test | 98.81 | 98.60 | 0.06
 | Laplacian Score | 100.00 | 100.00 | 1.86
Breast cancer Wisconsin (Diagnostic) | Proposed method, RBF kernel (σ = 0.0001) | 90.00 | 87.71 | 1.30
 | Proposed method, polynomial kernel (degree 1) | 90.00 | 87.71 | 1.33
 | Fisher Score | 87.50 | 90.91 | 0.20
 | Chi-Square Test | 87.50 | 90.91 | 0.25
 | Laplacian Score | 88.24 | 86.36 | 1.34
Glass identification | Proposed method, RBF kernel (σ = 1000) | 95.14 | 96.79 | 0.27
 | Proposed method, polynomial kernel (degree 6) | 95.24 | 96.86 | 0.23
 | Fisher Score | 98.46 | 98.26 | 1.03
 | Chi-Square Test | 98.46 | 98.26 | 1.19
 | Laplacian Score | 100.00 | 100.00 | 13.09
Letter recognition | Proposed method, RBF kernel (σ = 1000) | 99.40 | 99.42 | 280.41
 | Proposed method, polynomial kernel (degree 6) | 99.69 | 99.68 | 327.86
 | Fisher Score | 99.64 | 99.64 | 32.50
 | Chi-Square Test | 99.39 | 99.39 | 28.98
 | Laplacian Score | 99.31 | 99.30 | 313.06
Statlog (Landsat satellite) | Proposed method, RBF kernel (σ = 1000) | 93.63 | 93.65 | 10.94
 | Proposed method, polynomial kernel (degree 1) | 93.89 | 93.85 | 12.48
 | Fisher Score | 33.33 | 50.00 | 0.14
 | Chi-Square Test | 66.67 | 75.00 | 0.17
 | Laplacian Score | 80.95 | 75.00 | 1.97
Wine | Proposed method, RBF kernel (σ = 0.0001) | 91.12 | 91.66 | 0.13
 | Proposed method, polynomial kernel (degree 2) | 91.12 | 91.66 | 0.13
 | Fisher Score | 100.00 | 100.00 | 0.27
 | Chi-Square Test | 100.00 | 100.00 | 0.36
 | Laplacian Score | 100.00 | 100.00 | 3.95
Statlog (Vehicle silhouettes) | Proposed method, RBF kernel (σ = 1000) | 87.42 | 85.29 | 3.16
 | Proposed method, polynomial kernel (degree 5) | 87.45 | 85.32 | 3.42
 | Fisher Score | 85.00 | 87.78 | 0.38
 | Chi-Square Test | 89.66 | 88.86 | 0.38
 | Laplacian Score | 87.47 | 83.32 | 5.08
Housing | Proposed method, RBF kernel (σ = 1000) | 85.60 | 85.12 | 1.13
 | Proposed method, polynomial kernel (degree 4) | 85.62 | 85.18 | 1.17
 | Fisher Score | 96.97 | 95.24 | 1.19
 | Chi-Square Test | 95.15 | 95.56 | 1.33
 | Laplacian Score | 98.74 | 99.14 | 14.72
Machine | Proposed method, RBF kernel (σ = 0.0001) | 85.46 | 83.78 | 0.19
 | Proposed method, polynomial kernel (degree 3) | 85.14 | 83.41 | 0.19
 | Fisher Score | 94.77 | 94.42 | 0.52
 | Chi-Square Test | 94.41 | 94.20 | 0.52
 | Laplacian Score | 98.10 | 98.09 | 6.97
Mammographic mass | Proposed method, RBF kernel (σ = 0.05) | 75.33 | 75.74 | 1.81
 | Proposed method, polynomial kernel (degree 1) | 75.33 | 75.74 | 1.88
 | Fisher Score | 66.67 | 75.00 | 0.17
 | Chi-Square Test | 50.00 | 66.67 | 0.17
 | Laplacian Score | 71.67 | 74.63 | 1.81
Seismic-bumps | Proposed method, RBF kernel (σ = 1000) | 80.85 | 73.05 | 1.11
 | Proposed method, polynomial kernel (degree 3) | 80.85 | 73.05 | 1.19
 | Fisher Score | 72.73 | 80.00 | 0.25
 | Chi-Square Test | 90.91 | 94.12 | 0.30
 | Laplacian Score | 78.13 | 82.93 | 2.02
Cardiotocography | Proposed method, RBF kernel (σ = 1000) | 90.49 | 91.74 | 2.84
 | Proposed method, polynomial kernel (degree 4) | 90.68 | 91.86 | 3.06
 | Fisher Score | 89.56 | 85.86 | 0.59
 | Chi-Square Test | 96.67 | 95.24 | 0.61
 | Laplacian Score | 92.09 | 91.67 | 4.38
Forest type mapping | Proposed method, RBF kernel (σ = 1000) | 95.21 | 95.09 | 1.28
 | Proposed method, polynomial kernel (degree 2) | 95.83 | 95.54 | 1.34
 | Fisher Score | 96.63 | 96.30 | 0.59
 | Chi-Square Test | 94.89 | 93.45 | 0.59
 | Laplacian Score | 100.00 | 100.00 | 9.33
Image segmentation | Proposed method, RBF kernel (σ = 1000) | 98.14 | 98.14 | 22.39
 | Proposed method, polynomial kernel (degree 8) | 98.35 | 98.33 | 22.75
 | Fisher Score | 100.00 | 100.00 | 2.20
 | Chi-Square Test | 98.41 | 98.10 | 2.14
 | Laplacian Score | 98.72 | 98.67 | 22.53

5. CONCLUSION

Feature selection is a crucial issue in machine learning because it allows users to discard redundant features
that are not correlated with the target class in the dataset. There are two types of feature selection, filter and
wrapper, and methods may also combine both in an ensemble. In this paper, a new feature selection method based on
a kernel function was introduced and applied to 16 real-world datasets from the UCI data repository. K-means
clustering was used as the classifier, using only the first 75 percent of the features as sorted by this method.
The method was evaluated with the RBF and polynomial kernel functions, using 10-fold cross-validation to determine
the accuracy and F1-Score for performance comparison. The running time was also measured and analyzed.

The experiments show that, with the RBF kernel function, the performance of the new feature selection varies
with the value of the kernel parameter σ, and most datasets achieve their best performance with σ = 1000.
In contrast, the feature selection based on the polynomial kernel function is not affected by the polynomial degree.
The new feature selection based on the RBF kernel function also has a shorter running time than the one based on
the polynomial kernel function on most datasets. For future work, there remains considerable room to develop new
feature selection methods. Other kernel functions and evaluation techniques could be used for comparison, and other
classifiers could also be considered.